Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics
Abstract
Heterogeneous DNA sequences can be partitioned into domains that are homogeneous in their composition of the four nucleotides A, C, G, and T and of the stop codons. We recursively apply a new entropic segmentation method, based on the Jensen-Shannon and Jensen-Rényi divergences, to DNA sequences in order to find the borders between coding and noncoding regions. We have chosen 12- and 18-symbol alphabets that capture (i) the differential nucleotide composition across the three codon positions and (ii) the differential stop-codon composition in all three phases (reading frames) on both strands of the DNA. The new segmentation method is based on the Jensen-Rényi divergence measure, nucleotide statistics, and stop-codon statistics on both DNA strands. The recursive segmentation process requires no prior training on known datasets. Applying it to three complete bacterial genomes, we find that using nucleotide composition, stop-codon composition, and the Jensen-Rényi divergence improves the accuracy with which the borders between coding and noncoding regions are found.
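The abstract describes the recursive procedure only in outline. The Python code below is a minimal sketch of classic recursive entropic segmentation under stated assumptions, not the authors' implementation. Each cut point is scored by the length-weighted Jensen divergence H(w1·p1 + w2·p2) − w1·H(p1) − w2·H(p2) between the symbol distributions on either side; the best cut is kept if its score exceeds a threshold, and both halves are then processed again. With alpha = 1 the entropy is Shannon's and the score is the Jensen-Shannon divergence; with 0 < alpha < 1 it becomes a Jensen-Rényi divergence built from the Rényi entropy of order alpha (the usual range, where Rényi entropy is concave and the divergence stays nonnegative). The names `encode_12`, `best_split`, and `segment`, the reading frame fixed at the first base, and the fixed `threshold` and `min_len` are hypothetical. The fixed threshold stands in for whatever stopping criterion the paper uses; recursive segmentation is commonly stopped by a statistical significance test. The 18-symbol stop-codon alphabet is not reproduced because its exact definition is not given here.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def encode_12(seq: str) -> List[str]:
    """Hypothetical 12-symbol encoding: each base tagged with its codon
    position, counted from the first base of the input (e.g. 'A0', 'C1')."""
    return [f"{base}{i % 3}" for i, base in enumerate(seq.upper())]


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Rényi entropy of order alpha in bits; alpha == 1 gives Shannon entropy."""
    p = [x for x in p if x > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log2(x) for x in p)
    return math.log2(sum(x ** alpha for x in p)) / (1.0 - alpha)


def jensen_divergence(left: Counter, right: Counter, alpha: float = 1.0) -> float:
    """Length-weighted Jensen divergence between two segments' symbol
    distributions: H(w1 p1 + w2 p2) - w1 H(p1) - w2 H(p2).
    alpha = 1: Jensen-Shannon; 0 < alpha < 1: Jensen-Rényi."""
    n1, n2 = sum(left.values()), sum(right.values())
    n = n1 + n2
    symbols = sorted(set(left) | set(right))
    p1 = [left[s] / n1 for s in symbols]
    p2 = [right[s] / n2 for s in symbols]
    w1, w2 = n1 / n, n2 / n
    mix = [w1 * a + w2 * b for a, b in zip(p1, p2)]
    return (renyi_entropy(mix, alpha)
            - w1 * renyi_entropy(p1, alpha)
            - w2 * renyi_entropy(p2, alpha))


def best_split(symbols: List[str], lo: int, hi: int,
               min_len: int, alpha: float) -> Tuple[int, float]:
    """Scan every cut in symbols[lo:hi] with running counts and return
    (cut, divergence) for the maximum; cut == -1 if no cut is allowed."""
    left: Counter = Counter()
    right: Counter = Counter(symbols[lo:hi])
    best_cut, best_d = -1, -1.0
    for cut in range(lo + 1, hi):
        s = symbols[cut - 1]  # move one symbol from the right side to the left
        left[s] += 1
        right[s] -= 1
        if cut - lo < min_len or hi - cut < min_len:
            continue
        d = jensen_divergence(left, right, alpha)
        if d > best_d:
            best_cut, best_d = cut, d
    return best_cut, best_d


def segment(symbols: List[str], threshold: float, min_len: int = 30,
            alpha: float = 1.0) -> List[Tuple[int, int]]:
    """Recursively split while the best cut's divergence exceeds the
    (hypothetical, fixed) threshold; return sorted [start, end) segments."""
    segments = []
    stack = [(0, len(symbols))]
    while stack:
        lo, hi = stack.pop()
        cut, d = best_split(symbols, lo, hi, min_len, alpha)
        if cut == -1 or d <= threshold:
            segments.append((lo, hi))  # homogeneous enough: stop here
        else:
            stack.append((cut, hi))
            stack.append((lo, cut))
    return sorted(segments)
```

Passing `encode_12(seq)` instead of `list(seq)` makes the score sensitive to codon-position composition as well as overall base composition. Keeping running counts means each level of the recursion costs time proportional to the segment length times the alphabet size.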
Similar resources
Finding borders between coding and noncoding DNA regions by an entropic segmentation method.
We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical propertie...
Applications of Recursive Segmentation to the Analysis of DNA Sequences
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicatin...
Segmentation of short human exons based on spectral features of double curves
This paper presents a new segmentation method based on spectral analysis to locate borders between short protein coding regions and non-coding regions. We formulate the innovative double curve representation of a DNA sequence and apply local three-codon measurement on the discrete Fourier spectral features at 1/3 frequency to identify short protein coding regions. The proposed spectral segmenta...
Divergence Measures for DNA Segmentation
Entropy-based divergence measures have shown promising results in many areas of engineering and image processing. In this study, we use the Jensen-Shannon and Jensen-Rényi divergence measures for DNA segmentation. Based on these information theoretic measures and protein shape coded in DNA, we propose a new approach to the problem of finding the borders between coding and noncoding DNA regions....
A Pixon-based Image Segmentation Method Considering Textural Characteristics of Image
Image segmentation is an essential and critical process in image processing and pattern recognition. In this paper, we propose a texture-based method to segment an input image into regions. In our method, an entropy-based texture map of the image is extracted, followed by a histogram equalization step to discriminate different regions. Then with the aim of eliminating unnecessary details and achi...
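The spectral snippet on short human exons rests on a well-known property of protein-coding DNA: a period-3 pattern in base usage, which shows up as a peak in the Fourier spectrum at frequency 1/3. The Python code below is a minimal sketch of that standard measure computed from the four binary indicator sequences of A, C, G, and T. It does not implement the double-curve representation or the three-codon measurement of the cited paper, whose details are not given here; the function names and the `window` and `step` defaults are hypothetical.

```python
import cmath
from typing import List

# e^{-2*pi*i*n/3} takes only three distinct values, so precompute them.
_PHASES = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]


def period3_power(seq: str) -> float:
    """S(1/3): sum over A, C, G, T of |U_X(1/3)|^2, where U_X is the Fourier
    transform of the indicator u_X[n] = 1 if seq[n] == X else 0,
    evaluated at frequency 1/3 cycles per base."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        coeff = sum(_PHASES[n % 3] for n, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total


def period3_profile(seq: str, window: int = 120, step: int = 3) -> List[float]:
    """Period-3 power of each sliding window, divided by the window length.
    The window and step defaults are illustrative, not taken from any paper."""
    return [period3_power(seq[i:i + window]) / window
            for i in range(0, len(seq) - window + 1, step)]
```

Windows whose values stand well above the sequence-wide background are candidate coding stretches. As a sanity check, a perfect codon repeat such as `"ATG" * 40` gives a large value, while a period-4 repeat such as `"ACGT" * 30` gives essentially zero.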
Journal title:
- EURASIP J. Adv. Sig. Proc.
Volume: 2004
Publication date: 2004